Greedy-layer pruning: Speeding up transformer models for natural language processing

Authors

Abstract

Fine-tuning transformer models after unsupervised pre-training reaches a very high performance on many different natural language processing tasks. Unfortunately, transformers suffer from long inference times which greatly increase costs in production. One possible solution is to use knowledge distillation, which addresses this problem by transferring information from large teacher models to smaller student models. Knowledge distillation maintains high performance and reaches high compression rates; nevertheless, the size of the student model is fixed after pre-training and cannot be changed individually for a given downstream task or use-case to reach a desired performance/speedup ratio. Another solution to reduce the size of models in a much more fine-grained and computationally cheaper fashion is to prune layers after pre-training. The price to pay is that the performance of layer-wise pruning algorithms is not on par with state-of-the-art knowledge distillation methods. In this paper, Greedy-layer pruning is introduced to (1) outperform current layer-wise pruning methods, (2) close the performance gap when compared to knowledge distillation, while (3) providing a method to adapt the model size dynamically and reach a desired performance/speedup tradeoff without the need for additional pre-training phases. Our source code is available at https://github.com/deepopinion/greedy-layer-pruning.
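The abstract only outlines the idea of pruning layers greedily; the sketch below is a hypothetical illustration of such a greedy layer-wise search, not the authors' implementation (their code is in the linked repository). The evaluate callback, the num_to_prune parameter, and the toy metric in the usage example are assumptions made purely for illustration.

    # Hypothetical sketch of greedy layer pruning: at each step, drop the layer
    # whose removal hurts a validation metric the least, until enough layers are gone.
    from typing import Callable, List, Sequence

    def greedy_layer_prune(
        layers: Sequence[int],
        evaluate: Callable[[List[int]], float],
        num_to_prune: int,
    ) -> List[int]:
        kept = list(layers)
        for _ in range(num_to_prune):
            best_subset, best_score = None, float("-inf")
            for layer in kept:
                # Score the model with this one layer removed (e.g. dev-set accuracy).
                candidate = [l for l in kept if l != layer]
                score = evaluate(candidate)
                if score > best_score:
                    best_subset, best_score = candidate, score
            kept = best_subset
        return kept

    if __name__ == "__main__":
        # Toy stand-in metric (assumption): pretend later layers matter more,
        # so earlier layers are pruned first. A real run would fine-tune and
        # evaluate the pruned transformer on the downstream task instead.
        toy_metric = lambda subset: float(sum(subset))
        print(greedy_layer_prune(range(12), toy_metric, num_to_prune=4))

Because the search is greedy over individual layers, the number of pruned layers can be chosen per downstream task to hit a desired performance/speedup tradeoff, which is the flexibility the abstract contrasts with fixed-size distilled students.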


Similar articles

Speeding Up FastICA by Mixture Random Pruning

We study and derive a method to speed up kurtosis-based FastICA in the presence of information redundancy, i.e., for large samples. It consists of randomly decimating the data set as much as possible while preserving the quality of the reconstructed signals. By performing an analysis of the kurtosis estimator, we find the maximum reduction rate which guarantees a narrow confidence interval of such ...


Connectionist Models for Natural Language Processing Program

The scientific adequacy of models based on a small number of coarse-grained primitives (e.g. conceptual dependency), popular in AI during the 70's, has been called into question and substantially replaced by a current emphasis in much of computational linguistics on lexicalist models (i.e., ones which use words for representing concepts or meanings). However, few people can doubt that words are...


TFLEX: Speeding Up Deep Parsing with Strategic Pruning

This paper presents a method for speeding up a deep parser through backbone extraction and pruning based on CFG ambiguity packing. The TRIPS grammar is a wide-coverage grammar for deep natural language understanding in dialogue, utilized in 6 different application domains, and with high coverage and sentence-level accuracy on human-human task-oriented dialogue corpora (Dzikovska, 2004). The TR...


Speeding up LFG Parsing Using C-Structure Pruning

In this paper we present a method for greatly reducing parse times in LFG parsing, while at the same time maintaining parse accuracy. We evaluate the methodology on data from English, German and Norwegian and show that the same patterns hold across languages. We achieve a speedup of 67% on the English data and 49% on the German data. On a small amount of data for Norwegian, we achieve a speedup...



Journal

Journal: Pattern Recognition Letters

Year: 2022

ISSN: 1872-7344, 0167-8655

DOI: https://doi.org/10.1016/j.patrec.2022.03.023